Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set
Abstract
Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up, breadth-first search direction. The computation starts from frequent 1-itemsets (minimal length frequent itemsets) and continues until all maximal (length) frequent itemsets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform reasonably well when all maximal frequent itemsets are short. However, performance decreases drastically when some of the maximal frequent itemsets are relatively long. We present a new algorithm which combines both the bottom-up and top-down directions. The main search direction is still bottom-up, but a restricted search is conducted in the top-down direction. This search is used only for maintaining and updating a new data structure we designed, the maximum frequent candidate set, which is used to prune candidates in the bottom-up search. An important characteristic of the algorithm is that it does not need to examine every frequent itemset explicitly. It therefore performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, which therefore immediately specifies all frequent itemsets. We evaluate the performance of the algorithm using a well-known benchmark database. The improvements can be up to several orders of magnitude, compared to the best current algorithms.
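To make the terminology concrete, the following is a minimal Python sketch of the conventional bottom-up, breadth-first search that the abstract describes as the typical approach, followed by the extraction of the maximum frequent set. It is not the Pincer-Search algorithm itself and does not implement the maximum frequent candidate set. The function names, the toy transactions, and the 0.5 support threshold are all assumptions chosen for illustration.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets_bottom_up(transactions, min_support):
    """Level-wise search: frequent 1-itemsets first, then longer itemsets."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support]
    frequent = []
    while level:
        frequent.extend(level)
        current = set(level)
        k = len(level[0]) + 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        level = [c for c in candidates
                 if support(c, transactions) >= min_support]
    return frequent

def maximum_frequent_set(frequent):
    """Keep only maximal itemsets: those with no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]

# Hypothetical toy database: four transactions over items a..e.
transactions = [set("abcd"), set("abcd"), set("abc"), set("e")]
frequent = frequent_itemsets_bottom_up(transactions, min_support=0.5)
mfs = maximum_frequent_set(frequent)
```

On this toy database the bottom-up search explicitly visits 15 frequent itemsets, while the maximum frequent set contains the single itemset {a, b, c, d}, from which all 15 follow as its non-empty subsets. This gap between the number of frequent itemsets and the size of the maximum frequent set is what grows quickly when maximal frequent itemsets are long.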
Similar resources
A comprehensive method for discovering the maximal frequent set
Association rule mining can be divided into two steps. The first step is to find all frequent itemsets, whose occurrences are greater than or equal to the user-specified threshold. The second step is to generate reliable association rules based on all frequent itemsets found in the first step. Identifying all frequent itemsets in a large database dominates the overall performance in the a...
Pincer-Search: An Efficient Algorithm for Discovering the Maximum Frequent Set
Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up, breadth-first search direction. The computation starts from frequent 1-itemsets (the minimum length frequent itemsets) and continues until all maximal (lengt...
Discovering Maximal Frequent Item set using Association Array and Depth First Search Procedure with Effective Pruning Mechanisms
The first step of association rule mining is finding all frequent itemsets. The generation of reliable association rules is based on all frequent itemsets found in the first step. Obtaining all frequent itemsets in a large database dominates the overall performance of association rule mining. In this paper, an efficient method for discovering the maximal frequent itemsets is proposed. This met...
An Efficient Algorithm for Mining Multilevel Association Rule Based on Pincer Search
Discovering frequent itemsets is a key difficulty in significant data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. The problem of developing models and algorithms for multilevel association mining poses new challenges for mathematics and computer science. In this paper, we present a model of mining multilevel association rules whi...
Max-Miner Algorithm Using Knowledge Discovery Process in Data Mining
Discovering frequent item sets is an important key problem in data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up, breadth-first search direction. The computation starts from frequent itemsets (the minimum length frequent itemsets) and continues until all maximal (lengt...